Modern Network Design and Logic Visualization

林嶔 (Lin, Chin)

Lesson 9

Introduction

– The vanishing gradient problem seems to have been solved by Residual Learning, but is it really that simple?

– The weight initialization problem is tied to local minima. Apart from the choice of optimizer and hyperparameters, transferring learned features is currently the only remedy, so "finding a good way to transfer learned features" is very important!

– There are plenty of methods for dealing with overfitting, but its root cause is that the "number of parameters to be solved" far exceeds the "amount of data". So, can we design a model with a "small parameter count" that is still sufficiently complex (deep)?

– Let's learn by following the footsteps of ILSVRC and see how these breakthroughs were made step by step

F9_1

– These are the past champion models of ILSVRC. Let's see how the design concepts behind Model Architecture have evolved over time!

F9_2

The Evolution of Large Network Architectures (1)

AlexNet

| Operator | Kernel | Stride | Filter | Group | Input size | Parameter size |
|---|---|---|---|---|---|---|
| CONV + ReLU + LRN | 11 | 4 | 96 | 2 | 224 * 224 * 3 | 11 * 11 * 3 * 96 / 2 ~ 17K |
| Max Pool | 3 | 2 | | | 56 * 56 * 96 | |
| CONV + ReLU + LRN | 5 | 1 | 256 | 2 | 28 * 28 * 96 | 5 * 5 * 96 * 256 / 2 ~ 307K |
| Max Pool | 3 | 2 | | | 28 * 28 * 256 | |
| CONV + ReLU | 3 | 1 | 384 | 1 | 14 * 14 * 256 | 3 * 3 * 256 * 384 / 1 ~ 884K |
| CONV + ReLU | 3 | 1 | 384 | 2 | 14 * 14 * 384 | 3 * 3 * 384 * 384 / 2 ~ 664K |
| CONV + ReLU | 3 | 1 | 256 | 2 | 12 * 12 * 384 | 3 * 3 * 384 * 256 / 2 ~ 442K |
| Max Pool | 3 | 2 | | | 12 * 12 * 256 | |
| FC + ReLU | | | 4096 | | 6 * 6 * 256 | 6 * 6 * 256 * 4096 ~ 37749K |
| FC + ReLU | | | 4096 | | 4096 | 4096 * 4096 ~ 16777K |
| FC + Softmax | | | 1000 | | 4096 | 4096 * 1000 ~ 4096K |
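The "Parameter size" entries in the table all follow the grouped-convolution formula k × k × C_in × C_out / g. A minimal R sketch (the helper name `conv_params` is introduced here for illustration and is not part of AlexNet's code):

```r
# Parameter count of a (grouped) convolution layer: each of the g groups
# sees C_in/g input channels and produces C_out/g filters, so the total is
# k * k * (C_in/g) * (C_out/g) * g = k * k * C_in * C_out / g (biases omitted).
conv_params <- function(k, c_in, c_out, g = 1) {
  k * k * c_in * c_out / g
}

conv_params(11, 3, 96, g = 2)    # first CONV layer:  17424  (~17K)
conv_params(5, 96, 256, g = 2)   # second CONV layer: 307200 (~307K)
conv_params(3, 256, 384)         # third CONV layer:  884736 (~884K)
```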

The Evolution of Large Network Architectures (2)

F9_3

The Evolution of Large Network Architectures (3)

  1. Weight initialization problem - since no dataset larger than ImageNet was available, this problem was simply not addressed

  2. Vanishing gradient problem - ReLU was used as the nonlinear activation function, and with only 8 layers of depth this should not be a big issue

  3. Overfitting problem - the network has 61 million parameters in total while ImageNet provides only 1.3 million training images, so extremely elaborate data augmentation was used to avoid overfitting, and Dropout was also designed to address it

The Evolution of Large Network Architectures (4)

  1. The parameter count seems a bit high, and it is concentrated in the fully connected layers (~97%). Since the success of convolutional networks in image recognition comes from their resemblance to biological vision, the stacking of convolutional layers should be emphasized and fully connected layers abandoned

– Hence, when GoogleNet (Inception net v1) won the championship in 2014, the network had completely abandoned fully connected layers, relying solely on convolutional layers to extract image features (see Going Deeper with Convolutions), so the parameter count dropped dramatically

  1. AlexNet used convolution kernels of inconsistent sizes, which made further development of the network quite difficult, so this problem had to be solved properly

– Through the efforts of Network In Network and Very Deep Convolutional Networks for Large-Scale Image Recognition, it was essentially settled that from then on kernel sizes would mainly be 3×3 and 1×1

The Evolution of Large Network Architectures (5)

The Evolution of Large Network Architectures (6)

– This is the original version of the Inception module. Let's try to compute how many parameters it needs:

F9_4

  1. First, the 1×1 part: it needs 1×1×256×128 parameters

  2. Next, the 3×3 part: it needs 3×3×256×192 parameters

  3. Next, the 5×5 part: it needs 5×5×256×96 parameters
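Summing the three branches above gives the total weight count of this naive module:

```r
# Total learnable weights in the naive Inception module above
# (the input has 256 channels; biases are omitted)
p_1x1 <- 1 * 1 * 256 * 128   #  32768
p_3x3 <- 3 * 3 * 256 * 192   # 442368
p_5x5 <- 5 * 5 * 256 * 96    # 614400
p_1x1 + p_3x3 + p_5x5        # 1089536 parameters in total
```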

The Evolution of Large Network Architectures (7)

F9_5

  1. The 1×1 part is unchanged, so it still needs 1×1×256×128 parameters

  2. Next, the 3×3 part: its first layer needs 1×1×256×64 parameters and its second layer 3×3×64×192 parameters

  3. Next, the 5×5 part: its first layer needs 1×1×256×64 parameters and its second layer 5×5×64×96 parameters

  4. Finally, the Pooling part: it needs 1×1×256×64 parameters

– This structure is called a bottleneck, because the number of feature maps goes from many to few and then back from few to many. It was later used extensively to reduce the number of parameters.

– Note that compressing too much harms the network's accuracy; in general, bottleneck structures rarely compress by more than a factor of 4 (here it is 3)
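Summing the four branches shows how much the 1×1 reductions save compared with the naive module:

```r
# Same computation for the bottleneck version: every expensive kernel
# first passes through a 1x1 reduction down to 64 channels
p_1x1  <- 1 * 1 * 256 * 128                      #  32768 (unchanged)
p_3x3  <- 1 * 1 * 256 * 64 + 3 * 3 * 64 * 192    # 126976
p_5x5  <- 1 * 1 * 256 * 64 + 5 * 5 * 64 * 96     # 169984
p_pool <- 1 * 1 * 256 * 64                       #  16384
total  <- p_1x1 + p_3x3 + p_5x5 + p_pool         # 346112

1089536 / total   # roughly 3.1x fewer parameters than the naive module
```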

The Evolution of Large Network Architectures (8)

– Since extensive research has confirmed the powerful effect of Batch Normalization, the traditional convolution unit CONV+ReLU became CONV+BN+ReLU (or BN+ReLU+CONV).

– It is worth mentioning that, thanks to this series of advances, networks kept getting deeper while the parameter count kept shrinking, so from this point on Dropout was rarely used; when it was, it was at most applied just before the final output layer.

The Evolution of Large Network Architectures (9)

– For details of the ResNet paper, see Deep Residual Learning for Image Recognition

F9_6

F9_7

The Evolution of Large Network Architectures (10)

– DenseNet, proposed later, introduced a very important idea: could we predict simple images with shallow features while using the deeper layers for difficult images? The details can be found in the paper: Densely Connected Convolutional Networks

F9_8

F9_9

– Consider a ResNet module with a bottleneck: it first passes through a 1×1 kernel, usually compressing by 4×, for 1×1×1024×256 parameters; then a 3×3 kernel, for 3×3×256×256 parameters; then another 1×1 kernel which must restore the original input dimension, for 1×1×256×1024 parameters. The whole Module needs 1,114,112 parameters!

– As for DenseNet, the output is fixed at 16 dimensions, and usually a 1×1 kernel comes first. Let's compute the parameter count: the first 1×1 kernel usually outputs four times the 3×3 dimension, for 1×1×1024×64 parameters; then a 3×3 kernel, for 3×3×64×16 parameters. The whole Module needs only 74,752 parameters!
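The two module counts above can be checked directly:

```r
# ResNet bottleneck module (1024 -> 256 -> 256 -> 1024)
resnet_module <- 1 * 1 * 1024 * 256 + 3 * 3 * 256 * 256 + 1 * 1 * 256 * 1024
resnet_module                    # 1114112

# DenseNet module (1024 -> 64 -> 16)
densenet_module <- 1 * 1 * 1024 * 64 + 3 * 3 * 64 * 16
densenet_module                  # 74752

resnet_module / densenet_module  # the DenseNet module is ~15x cheaper
```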

The Evolution of Large Network Architectures (11)

  1. With comparable total parameter counts: deep + narrow > shallow + wide

  2. 1×1 kernels not only increase a convolution's nonlinear expressive power, they can also be used to compress parameters

  3. The network is usually built by repeating a Module throughout its middle, so the key becomes designing a good Module

  4. The connections between Modules must use Residual Connections or Dense Connections to avoid vanishing gradients

  5. Make the whole network use convolutional layers as much as possible, rather than fully connected layers and Pooling

  6. The number of feature maps must grow from few to many as the layers stack up

  7. Finally, don't forget the Squeeze-and-Excitation mechanism from the last lesson; at the cost of very few parameters it can substantially boost model performance

– So the accuracy of large networks is no longer in doubt, as long as you have a large enough dataset plus a deep enough network.

– The focus now is: can we compress the network's parameter count even further? After all, large networks can only run in cloud environments, which is impractical in many application scenarios, so let's look at how to reduce the parameter count even further!

Exercise 1: Reproducing the inference of an Inception module

– Remember how to extract the output of a network's intermediate layers? We can do it like this:

library(EBImage) 
library(imager)
library(magrittr)
library(mxnet)

#Load a pre-trained Inception-BN network model
inception_model <- mx.model.load("model/Inception-BN", 126)

#Resize image
resized_img <- load.image('image/1.jpg') %>% EBImage::resize(., 224, 224)
X <- array(resized_img, dim = c(224, 224, 3, 1)) * 256

#Show model architecture
all_layers <- inception_model$symbol$get.internals()
print(all_layers$outputs[grepl('output', all_layers$outputs)])
##   [1] "conv_1_output"                    "bn_1_output"                     
##   [3] "relu_1_output"                    "pool_1_output"                   
##   [5] "conv_2_red_output"                "bn_2_red_output"                 
##   [7] "relu_2_red_output"                "conv_2_output"                   
##   [9] "bn_2_output"                      "relu_2_output"                   
##  [11] "pool_2_output"                    "conv_3a_1x1_output"              
##  [13] "bn_3a_1x1_output"                 "relu_3a_1x1_output"              
##  [15] "conv_3a_3x3_reduce_output"        "bn_3a_3x3_reduce_output"         
##  [17] "relu_3a_3x3_reduce_output"        "conv_3a_3x3_output"              
##  [19] "bn_3a_3x3_output"                 "relu_3a_3x3_output"              
##  [21] "conv_3a_double_3x3_reduce_output" "bn_3a_double_3x3_reduce_output"  
##  [23] "relu_3a_double_3x3_reduce_output" "conv_3a_double_3x3_0_output"     
##  [25] "bn_3a_double_3x3_0_output"        "relu_3a_double_3x3_0_output"     
##  [27] "conv_3a_double_3x3_1_output"      "bn_3a_double_3x3_1_output"       
##  [29] "relu_3a_double_3x3_1_output"      "avg_pool_3a_pool_output"         
##  [31] "conv_3a_proj_output"              "bn_3a_proj_output"               
##  [33] "relu_3a_proj_output"              "ch_concat_3a_chconcat_output"    
##  [35] "conv_3b_1x1_output"               "bn_3b_1x1_output"                
##  [37] "relu_3b_1x1_output"               "conv_3b_3x3_reduce_output"       
##  [39] "bn_3b_3x3_reduce_output"          "relu_3b_3x3_reduce_output"       
##  [41] "conv_3b_3x3_output"               "bn_3b_3x3_output"                
##  [43] "relu_3b_3x3_output"               "conv_3b_double_3x3_reduce_output"
##  [45] "bn_3b_double_3x3_reduce_output"   "relu_3b_double_3x3_reduce_output"
##  [47] "conv_3b_double_3x3_0_output"      "bn_3b_double_3x3_0_output"       
##  [49] "relu_3b_double_3x3_0_output"      "conv_3b_double_3x3_1_output"     
##  [51] "bn_3b_double_3x3_1_output"        "relu_3b_double_3x3_1_output"     
##  [53] "avg_pool_3b_pool_output"          "conv_3b_proj_output"             
##  [55] "bn_3b_proj_output"                "relu_3b_proj_output"             
##  [57] "ch_concat_3b_chconcat_output"     "conv_3c_3x3_reduce_output"       
##  [59] "bn_3c_3x3_reduce_output"          "relu_3c_3x3_reduce_output"       
##  [61] "conv_3c_3x3_output"               "bn_3c_3x3_output"                
##  [63] "relu_3c_3x3_output"               "conv_3c_double_3x3_reduce_output"
##  [65] "bn_3c_double_3x3_reduce_output"   "relu_3c_double_3x3_reduce_output"
##  [67] "conv_3c_double_3x3_0_output"      "bn_3c_double_3x3_0_output"       
##  [69] "relu_3c_double_3x3_0_output"      "conv_3c_double_3x3_1_output"     
##  [71] "bn_3c_double_3x3_1_output"        "relu_3c_double_3x3_1_output"     
##  [73] "max_pool_3c_pool_output"          "ch_concat_3c_chconcat_output"    
##  [75] "conv_4a_1x1_output"               "bn_4a_1x1_output"                
##  [77] "relu_4a_1x1_output"               "conv_4a_3x3_reduce_output"       
##  [79] "bn_4a_3x3_reduce_output"          "relu_4a_3x3_reduce_output"       
##  [81] "conv_4a_3x3_output"               "bn_4a_3x3_output"                
##  [83] "relu_4a_3x3_output"               "conv_4a_double_3x3_reduce_output"
##  [85] "bn_4a_double_3x3_reduce_output"   "relu_4a_double_3x3_reduce_output"
##  [87] "conv_4a_double_3x3_0_output"      "bn_4a_double_3x3_0_output"       
##  [89] "relu_4a_double_3x3_0_output"      "conv_4a_double_3x3_1_output"     
##  [91] "bn_4a_double_3x3_1_output"        "relu_4a_double_3x3_1_output"     
##  [93] "avg_pool_4a_pool_output"          "conv_4a_proj_output"             
##  [95] "bn_4a_proj_output"                "relu_4a_proj_output"             
##  [97] "ch_concat_4a_chconcat_output"     "conv_4b_1x1_output"              
##  [99] "bn_4b_1x1_output"                 "relu_4b_1x1_output"              
## [101] "conv_4b_3x3_reduce_output"        "bn_4b_3x3_reduce_output"         
## [103] "relu_4b_3x3_reduce_output"        "conv_4b_3x3_output"              
## [105] "bn_4b_3x3_output"                 "relu_4b_3x3_output"              
## [107] "conv_4b_double_3x3_reduce_output" "bn_4b_double_3x3_reduce_output"  
## [109] "relu_4b_double_3x3_reduce_output" "conv_4b_double_3x3_0_output"     
## [111] "bn_4b_double_3x3_0_output"        "relu_4b_double_3x3_0_output"     
## [113] "conv_4b_double_3x3_1_output"      "bn_4b_double_3x3_1_output"       
## [115] "relu_4b_double_3x3_1_output"      "avg_pool_4b_pool_output"         
## [117] "conv_4b_proj_output"              "bn_4b_proj_output"               
## [119] "relu_4b_proj_output"              "ch_concat_4b_chconcat_output"    
## [121] "conv_4c_1x1_output"               "bn_4c_1x1_output"                
## [123] "relu_4c_1x1_output"               "conv_4c_3x3_reduce_output"       
## [125] "bn_4c_3x3_reduce_output"          "relu_4c_3x3_reduce_output"       
## [127] "conv_4c_3x3_output"               "bn_4c_3x3_output"                
## [129] "relu_4c_3x3_output"               "conv_4c_double_3x3_reduce_output"
## [131] "bn_4c_double_3x3_reduce_output"   "relu_4c_double_3x3_reduce_output"
## [133] "conv_4c_double_3x3_0_output"      "bn_4c_double_3x3_0_output"       
## [135] "relu_4c_double_3x3_0_output"      "conv_4c_double_3x3_1_output"     
## [137] "bn_4c_double_3x3_1_output"        "relu_4c_double_3x3_1_output"     
## [139] "avg_pool_4c_pool_output"          "conv_4c_proj_output"             
## [141] "bn_4c_proj_output"                "relu_4c_proj_output"             
## [143] "ch_concat_4c_chconcat_output"     "conv_4d_1x1_output"              
## [145] "bn_4d_1x1_output"                 "relu_4d_1x1_output"              
## [147] "conv_4d_3x3_reduce_output"        "bn_4d_3x3_reduce_output"         
## [149] "relu_4d_3x3_reduce_output"        "conv_4d_3x3_output"              
## [151] "bn_4d_3x3_output"                 "relu_4d_3x3_output"              
## [153] "conv_4d_double_3x3_reduce_output" "bn_4d_double_3x3_reduce_output"  
## [155] "relu_4d_double_3x3_reduce_output" "conv_4d_double_3x3_0_output"     
## [157] "bn_4d_double_3x3_0_output"        "relu_4d_double_3x3_0_output"     
## [159] "conv_4d_double_3x3_1_output"      "bn_4d_double_3x3_1_output"       
## [161] "relu_4d_double_3x3_1_output"      "avg_pool_4d_pool_output"         
## [163] "conv_4d_proj_output"              "bn_4d_proj_output"               
## [165] "relu_4d_proj_output"              "ch_concat_4d_chconcat_output"    
## [167] "conv_4e_3x3_reduce_output"        "bn_4e_3x3_reduce_output"         
## [169] "relu_4e_3x3_reduce_output"        "conv_4e_3x3_output"              
## [171] "bn_4e_3x3_output"                 "relu_4e_3x3_output"              
## [173] "conv_4e_double_3x3_reduce_output" "bn_4e_double_3x3_reduce_output"  
## [175] "relu_4e_double_3x3_reduce_output" "conv_4e_double_3x3_0_output"     
## [177] "bn_4e_double_3x3_0_output"        "relu_4e_double_3x3_0_output"     
## [179] "conv_4e_double_3x3_1_output"      "bn_4e_double_3x3_1_output"       
## [181] "relu_4e_double_3x3_1_output"      "max_pool_4e_pool_output"         
## [183] "ch_concat_4e_chconcat_output"     "conv_5a_1x1_output"              
## [185] "bn_5a_1x1_output"                 "relu_5a_1x1_output"              
## [187] "conv_5a_3x3_reduce_output"        "bn_5a_3x3_reduce_output"         
## [189] "relu_5a_3x3_reduce_output"        "conv_5a_3x3_output"              
## [191] "bn_5a_3x3_output"                 "relu_5a_3x3_output"              
## [193] "conv_5a_double_3x3_reduce_output" "bn_5a_double_3x3_reduce_output"  
## [195] "relu_5a_double_3x3_reduce_output" "conv_5a_double_3x3_0_output"     
## [197] "bn_5a_double_3x3_0_output"        "relu_5a_double_3x3_0_output"     
## [199] "conv_5a_double_3x3_1_output"      "bn_5a_double_3x3_1_output"       
## [201] "relu_5a_double_3x3_1_output"      "avg_pool_5a_pool_output"         
## [203] "conv_5a_proj_output"              "bn_5a_proj_output"               
## [205] "relu_5a_proj_output"              "ch_concat_5a_chconcat_output"    
## [207] "conv_5b_1x1_output"               "bn_5b_1x1_output"                
## [209] "relu_5b_1x1_output"               "conv_5b_3x3_reduce_output"       
## [211] "bn_5b_3x3_reduce_output"          "relu_5b_3x3_reduce_output"       
## [213] "conv_5b_3x3_output"               "bn_5b_3x3_output"                
## [215] "relu_5b_3x3_output"               "conv_5b_double_3x3_reduce_output"
## [217] "bn_5b_double_3x3_reduce_output"   "relu_5b_double_3x3_reduce_output"
## [219] "conv_5b_double_3x3_0_output"      "bn_5b_double_3x3_0_output"       
## [221] "relu_5b_double_3x3_0_output"      "conv_5b_double_3x3_1_output"     
## [223] "bn_5b_double_3x3_1_output"        "relu_5b_double_3x3_1_output"     
## [225] "max_pool_5b_pool_output"          "conv_5b_proj_output"             
## [227] "bn_5b_proj_output"                "relu_5b_proj_output"             
## [229] "ch_concat_5b_chconcat_output"     "global_pool_output"              
## [231] "flatten_output"                   "fc1_output"                      
## [233] "softmax_output"

– To make it easy to verify our output step by step, we also extract conv_3a_1x1_output and bn_3a_1x1_output for testing

#Outputs of interest
pool_2_output <- which(all_layers$outputs == 'pool_2_output') %>% all_layers$get.output()
conv_3a_1x1_output <- which(all_layers$outputs == 'conv_3a_1x1_output') %>% all_layers$get.output()
bn_3a_1x1_output <- which(all_layers$outputs == 'bn_3a_1x1_output') %>% all_layers$get.output()
avg_pool_3a_pool_output <-  which(all_layers$outputs == 'avg_pool_3a_pool_output') %>% all_layers$get.output()
ch_concat_3a_chconcat_output <- which(all_layers$outputs == 'ch_concat_3a_chconcat_output') %>% all_layers$get.output()

#Needed params
my_model <- inception_model
my_model$symbol <- ch_concat_3a_chconcat_output
my_model$arg.params <- my_model$arg.params[names(my_model$arg.params) %in% names(mx.symbol.infer.shape(ch_concat_3a_chconcat_output, data = c(224, 224, 3, 7))$arg.shapes)]
my_model$aux.params <- my_model$aux.params[names(my_model$aux.params) %in% names(mx.symbol.infer.shape(ch_concat_3a_chconcat_output, data = c(224, 224, 3, 7))$aux.shapes)]

#Build executor
out <- mx.symbol.Group(c(pool_2_output, conv_3a_1x1_output, bn_3a_1x1_output, avg_pool_3a_pool_output, ch_concat_3a_chconcat_output))
executor <- mx.simple.bind(symbol = out, data = c(224, 224, 3, 1), ctx = mx.cpu())
mx.exec.update.arg.arrays(executor, my_model$arg.params, match.name = TRUE)
mx.exec.update.aux.arrays(executor, my_model$aux.params, match.name = TRUE)
mx.exec.update.arg.arrays(executor, list(data = mx.nd.array(X)), match.name = TRUE)
mx.exec.forward(executor, is.train = FALSE)
Input <- as.array(executor$ref.outputs$pool_2_output)
check1 <- as.array(executor$ref.outputs$conv_3a_1x1_output)
check2 <- as.array(executor$ref.outputs$bn_3a_1x1_output)
check3 <- as.array(executor$ref.outputs$avg_pool_3a_pool_output)
Output <- as.array(executor$ref.outputs$ch_concat_3a_chconcat_output)

Exercise 1 answer (1)

– First, let's check our convolution inference function:

CONV_func <- function (input_array, input_weight, input_bias) {
  
  num_size <- dim(input_weight)[1]
  num_filter <- dim(input_weight)[4]
  
  if (num_size > 1) {
    
    original_dim <- dim(input_array)
    pad_size <- (num_size - 1)/2
    original_dim[1:2] <- original_dim[1:2] + pad_size * 2
    input_array_pad <- array(0, dim = original_dim)
    input_array_pad[(pad_size+1):(original_dim[1]-pad_size),(pad_size+1):(original_dim[2]-pad_size),,] <- input_array
    
  } else {
    
    input_array_pad <- input_array
    
  }
  
  out_array <- array(0, dim = c(dim(input_array)[1:2], num_filter, dim(input_array)[4]))
  
  for (l in 1:dim(out_array)[4]) {
    for (k in 1:num_filter) {
      for (j in 1:dim(out_array)[2]) {
        for (i in 1:dim(out_array)[1]) {
          out_array[i,j,k,l] <- sum(input_array_pad[i:(i+num_size-1),j:(j+num_size-1),,l] * input_weight[,,,k]) + input_bias[k]
        }
      }
    }
  }

  return(out_array)

}

my_check1 <- CONV_func(input_array = Input,
                       input_weight = as.array(my_model$arg.params$conv_3a_1x1_weight),
                       input_bias = as.array(my_model$arg.params$conv_3a_1x1_bias))

print(mean(abs(my_check1 - check1)))
## [1] 9.349769e-07

– Next, let's check our batch normalization inference function:

BN_func <- function (input_array, input_mean, input_var, input_gamma, input_beta, eps = 1e-5) {
  
  input_array_norm <- input_array
  
  for (i in 1:length(input_mean)) {
    input_array_norm[,,i,] <- (input_array[,,i,] - input_mean[i])/sqrt(input_var[i] + eps) * input_gamma[i] + input_beta[i]
  }
  
  return(input_array_norm)
  
}

my_check2 <- BN_func(input_array = my_check1,
                     input_mean = as.array(my_model$aux.params$bn_3a_1x1_moving_mean),
                     input_var = as.array(my_model$aux.params$bn_3a_1x1_moving_var),
                     input_gamma = as.array(my_model$arg.params$bn_3a_1x1_gamma),
                     input_beta = as.array(my_model$arg.params$bn_3a_1x1_beta))


print(mean(abs(my_check2 - check2)))
## [1] 2.71555e-07

– Next, let's check our pooling inference function:

POOL_func <- function (input_array, pad_size = 1, num_size = 3) {
  
  original_dim <- dim(input_array)
  original_dim[1:2] <- original_dim[1:2] + pad_size * 2
  input_array_pad <- array(0, dim = original_dim)
  input_array_pad[(pad_size+1):(original_dim[1]-pad_size),(pad_size+1):(original_dim[2]-pad_size),,] <- input_array
  
  out_array <- array(0, dim = dim(input_array))
  
  for (l in 1:dim(out_array)[4]) {
    for (k in 1:dim(out_array)[3]) {
      for (j in 1:dim(out_array)[2]) {
        for (i in 1:dim(out_array)[1]) {
          out_array[i,j,k,l] <- mean(input_array_pad[i:(i+num_size-1),j:(j+num_size-1),k,l])
        }
      }
    }
  }
  
  return(out_array)
  
}

my_check3 <- POOL_func(input_array = Input)

print(mean(abs(my_check3 - check3)))
## [1] 4.729769e-08

Exercise 1 answer (2)

F9_5

print(all_layers$outputs[grepl('output', all_layers$outputs)][11:34])
##  [1] "pool_2_output"                    "conv_3a_1x1_output"              
##  [3] "bn_3a_1x1_output"                 "relu_3a_1x1_output"              
##  [5] "conv_3a_3x3_reduce_output"        "bn_3a_3x3_reduce_output"         
##  [7] "relu_3a_3x3_reduce_output"        "conv_3a_3x3_output"              
##  [9] "bn_3a_3x3_output"                 "relu_3a_3x3_output"              
## [11] "conv_3a_double_3x3_reduce_output" "bn_3a_double_3x3_reduce_output"  
## [13] "relu_3a_double_3x3_reduce_output" "conv_3a_double_3x3_0_output"     
## [15] "bn_3a_double_3x3_0_output"        "relu_3a_double_3x3_0_output"     
## [17] "conv_3a_double_3x3_1_output"      "bn_3a_double_3x3_1_output"       
## [19] "relu_3a_double_3x3_1_output"      "avg_pool_3a_pool_output"         
## [21] "conv_3a_proj_output"              "bn_3a_proj_output"               
## [23] "relu_3a_proj_output"              "ch_concat_3a_chconcat_output"
#1x1

conv_3a_1x1 <- CONV_func(input_array = Input,
                         input_weight = as.array(my_model$arg.params$conv_3a_1x1_weight),
                         input_bias = as.array(my_model$arg.params$conv_3a_1x1_bias))

bn_3a_1x1 <- BN_func(input_array = conv_3a_1x1,
                     input_mean = as.array(my_model$aux.params$bn_3a_1x1_moving_mean),
                     input_var = as.array(my_model$aux.params$bn_3a_1x1_moving_var),
                     input_gamma = as.array(my_model$arg.params$bn_3a_1x1_gamma),
                     input_beta = as.array(my_model$arg.params$bn_3a_1x1_beta))

relu_3a_1x1 <- bn_3a_1x1
relu_3a_1x1[relu_3a_1x1 < 0] <- 0

#3x3

conv_3a_3x3_reduce <- CONV_func(input_array = Input,
                                input_weight = as.array(my_model$arg.params$conv_3a_3x3_reduce_weight),
                                input_bias = as.array(my_model$arg.params$conv_3a_3x3_reduce_bias))

bn_3a_3x3_reduce <- BN_func(input_array = conv_3a_3x3_reduce,
                            input_mean = as.array(my_model$aux.params$bn_3a_3x3_reduce_moving_mean),
                            input_var = as.array(my_model$aux.params$bn_3a_3x3_reduce_moving_var),
                            input_gamma = as.array(my_model$arg.params$bn_3a_3x3_reduce_gamma),
                            input_beta = as.array(my_model$arg.params$bn_3a_3x3_reduce_beta))

relu_3a_3x3_reduce <- bn_3a_3x3_reduce
relu_3a_3x3_reduce[relu_3a_3x3_reduce < 0] <- 0

conv_3a_3x3 <- CONV_func(input_array = relu_3a_3x3_reduce,
                         input_weight = as.array(my_model$arg.params$conv_3a_3x3_weight),
                         input_bias = as.array(my_model$arg.params$conv_3a_3x3_bias))

bn_3a_3x3 <- BN_func(input_array = conv_3a_3x3,
                     input_mean = as.array(my_model$aux.params$bn_3a_3x3_moving_mean),
                     input_var = as.array(my_model$aux.params$bn_3a_3x3_moving_var),
                     input_gamma = as.array(my_model$arg.params$bn_3a_3x3_gamma),
                     input_beta = as.array(my_model$arg.params$bn_3a_3x3_beta))

relu_3a_3x3 <- bn_3a_3x3
relu_3a_3x3[relu_3a_3x3 < 0] <- 0

#5x5

conv_3a_double_3x3_reduce <- CONV_func(input_array = Input,
                                      input_weight = as.array(my_model$arg.params$conv_3a_double_3x3_reduce_weight),
                                      input_bias = as.array(my_model$arg.params$conv_3a_double_3x3_reduce_bias))

bn_3a_double_3x3_reduce <- BN_func(input_array = conv_3a_double_3x3_reduce,
                                  input_mean = as.array(my_model$aux.params$bn_3a_double_3x3_reduce_moving_mean),
                                  input_var = as.array(my_model$aux.params$bn_3a_double_3x3_reduce_moving_var),
                                  input_gamma = as.array(my_model$arg.params$bn_3a_double_3x3_reduce_gamma),
                                  input_beta = as.array(my_model$arg.params$bn_3a_double_3x3_reduce_beta))

relu_3a_double_3x3_reduce <- bn_3a_double_3x3_reduce
relu_3a_double_3x3_reduce[relu_3a_double_3x3_reduce < 0] <- 0

conv_3a_double_3x3_0 <- CONV_func(input_array = relu_3a_double_3x3_reduce,
                                  input_weight = as.array(my_model$arg.params$conv_3a_double_3x3_0_weight),
                                  input_bias = as.array(my_model$arg.params$conv_3a_double_3x3_0_bias))

bn_3a_double_3x3_0 <- BN_func(input_array = conv_3a_double_3x3_0,
                              input_mean = as.array(my_model$aux.params$bn_3a_double_3x3_0_moving_mean),
                              input_var = as.array(my_model$aux.params$bn_3a_double_3x3_0_moving_var),
                              input_gamma = as.array(my_model$arg.params$bn_3a_double_3x3_0_gamma),
                              input_beta = as.array(my_model$arg.params$bn_3a_double_3x3_0_beta))

relu_3a_double_3x3_0 <- bn_3a_double_3x3_0
relu_3a_double_3x3_0[relu_3a_double_3x3_0 < 0] <- 0

conv_3a_double_3x3_1 <- CONV_func(input_array = relu_3a_double_3x3_0,
                                  input_weight = as.array(my_model$arg.params$conv_3a_double_3x3_1_weight),
                                  input_bias = as.array(my_model$arg.params$conv_3a_double_3x3_1_bias))

bn_3a_double_3x3_1 <- BN_func(input_array = conv_3a_double_3x3_1,
                              input_mean = as.array(my_model$aux.params$bn_3a_double_3x3_1_moving_mean),
                              input_var = as.array(my_model$aux.params$bn_3a_double_3x3_1_moving_var),
                              input_gamma = as.array(my_model$arg.params$bn_3a_double_3x3_1_gamma),
                              input_beta = as.array(my_model$arg.params$bn_3a_double_3x3_1_beta))

relu_3a_double_3x3_1 <- bn_3a_double_3x3_1
relu_3a_double_3x3_1[relu_3a_double_3x3_1 < 0] <- 0

#Pool

avg_pool_3a_pool <- POOL_func(input_array = Input)

conv_3a_proj <-  CONV_func(input_array = avg_pool_3a_pool,
                           input_weight = as.array(my_model$arg.params$conv_3a_proj_weight),
                           input_bias = as.array(my_model$arg.params$conv_3a_proj_bias))

bn_3a_proj <- BN_func(input_array = conv_3a_proj,
                      input_mean = as.array(my_model$aux.params$bn_3a_proj_moving_mean),
                      input_var = as.array(my_model$aux.params$bn_3a_proj_moving_var),
                      input_gamma = as.array(my_model$arg.params$bn_3a_proj_gamma),
                      input_beta = as.array(my_model$arg.params$bn_3a_proj_beta))

relu_3a_proj <- bn_3a_proj
relu_3a_proj[relu_3a_proj < 0] <- 0

#Concat

library(abind)

My_output <- abind(relu_3a_1x1, relu_3a_3x3, relu_3a_double_3x3_1, relu_3a_proj, along = 3)

#Check answer

print(mean(abs(My_output - Output)))
## [1] 1.455572e-07

The Evolution of Lightweight Network Architectures (1)

– So far the only effective parameter-compression technique we have formally introduced is the 1×1 kernel, so let's first learn by observing classic models. The figure below shows the relationship between parameter count and accuracy for several classic champion models:

F9_10

The Evolution of Lightweight Network Architectures (2)

– The annoying thing about the Inception Module is that it is full of traces of manual design, which left plenty of room for improvement in later network designs.

– ResNeXt focuses on the biggest weakness of ResNet, namely its Module design. Let's look at its Module design idea:

F9_11

– We can observe that the Module used by ResNeXt is multi-branch, but compared with the Inception Module each branch has a more uniform design and is easier to extend.

The Evolution of Lightweight Network Architectures (3)

F9_12

– This is the comparison between the ResNet and ResNeXt architectures used in the paper:

F9_13

– This is a series of experimental results; we can see that the more groups, the more accurate the result:

F9_14

The Evolution of Lightweight Network Architectures (4)

– Google's research team tried to solve this problem with a line of thought very similar to ResNeXt's, likewise starting from the Inception Module, and developed a relatively little-known network: Xception net

F9_15

The Evolution of Lightweight Network Architectures (5)

– To compensate for the damage this information loss does to accuracy, Google's research team designed an experiment comparing the effect of different activation functions on network accuracy, and ultimately found that using no activation function gave the highest accuracy.

F9_16

The Evolution of Lightweight Network Architectures (6)

– MobileNet v1 is structurally quite similar to Xception net, just with a slightly different research focus, while MobileNet v2 proposed a major conceptual advance: because Depthwise Separable Convolution loses information, the Bottleneck structure we had always liked to use was modified into the Inverted Bottleneck shown below:

F9_17

F9_19

F9_18

– Note that MobileNet v2's Module design draws on the findings of Xception net, so no activation function is used after the last convolution layer of a Module.
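To get a rough feel for why Depthwise Separable Convolution cuts parameters so sharply, here is an illustrative count (the 256-channel sizes are chosen for illustration only, not taken from the MobileNet paper):

```r
c_in <- 256; c_out <- 256; k <- 3

# Standard 3x3 convolution: every filter spans all input channels
standard <- k * k * c_in * c_out                   # 589824

# Depthwise separable: one k x k filter per input channel,
# followed by a 1x1 pointwise convolution to mix the channels
depthwise <- k * k * c_in + 1 * 1 * c_in * c_out   # 67840

standard / depthwise   # roughly 8.7x fewer parameters
```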

The Evolution of Lightweight Network Architectures (7)

– One feasible approach is to apply group convolution to the 1×1 kernels as well, but Depthwise Separable Convolution on the 3×3 kernels already causes a great deal of information loss, so extending it to the 1×1 kernels is clearly infeasible; a major difficulty arises here.

F9_20

Visualizing Network Logic (1)

– This research was published at CVPR 2016 in the paper Learning Deep Features for Discriminative Localization

F9_21

Visualizing Network Logic (2)

## try http:// if https:// URLs are not supported
## (on R >= 3.5, biocLite is deprecated; use BiocManager::install("EBImage") instead)
source("https://bioconductor.org/biocLite.R")
biocLite("EBImage")
library(imager)
library(EBImage) 
library(magrittr)
library(mxnet)

resized_img <- load.image('image/1.jpg') %>% EBImage::resize(., 224, 224)
X <- array(resized_img, dim = c(224, 224, 3, 1)) * 256

par(mai = rep(0, 4))
plot(resized_img, axes = FALSE)

Visualizing Network Logic (3)

Dense_model <- mx.model.load('model/densenet-imagenet-169-0', 125)
label_names <- readLines('model/chinese synset.txt', encoding = 'UTF-8')

all_layers <- Dense_model$symbol$get.internals()
relu1_output <- which(all_layers$outputs == 'relu1_output') %>% all_layers$get.output()
softmax_output <- which(all_layers$outputs == 'softmax_output') %>% all_layers$get.output()
  
#Note-1: 'tail(all_layers$outputs, 20)' can be used to understand the last few layers of your network.
#Note-2: 'mx.symbol.infer.shape(relu1_output, data = c(224, 224, 3, 1))$out.shapes' can be used to understand
#        the output shape of the layer of interest.
  
out <- mx.symbol.Group(c(relu1_output, softmax_output))
executor <- mx.simple.bind(symbol = out, data = c(224, 224, 3, 1), ctx = mx.cpu())
  
mx.exec.update.arg.arrays(executor, Dense_model$arg.params, match.name = TRUE)
mx.exec.update.aux.arrays(executor, Dense_model$aux.params, match.name = TRUE)
mx.exec.update.arg.arrays(executor, list(data = mx.nd.array(X)), match.name = TRUE)
mx.exec.forward(executor, is.train = FALSE)

ls(executor$ref.outputs)
## [1] "relu1_output"   "softmax_output"

Visualizing Network Logic (4)

– Let's first do it with plain R:

#Get the prediction output
pred_pos <- which.max(as.array(executor$ref.outputs$softmax_output))

#Get the weights in final fully connected layer
FC_weight <- as.array(Dense_model$arg.params$fc1_weight)

#Get the feature maps from last convolution layer
feature <- as.array(executor$ref.outputs$relu1_output)

#Weighted sum of feature maps: the core of class activation mapping (CAM)
for (i in 1:1664) {
  if (i == 1) {
    CAM.1 <- feature[,,i,] * FC_weight[i, pred_pos] 
  } else {
    CAM.1 <- CAM.1 + feature[,,i,] * FC_weight[i, pred_pos] 
  }
}

#Rescale to the range 0 to 1
CAM.1 <- (CAM.1 - min(CAM.1))/(max(CAM.1) - min(CAM.1)) 

– Using MXNet's built-in functions for this computation is faster and the code is much more concise; let's try it:

#Get the prediction output
pred_pos <- which.max(as.array(executor$ref.outputs$softmax_output))

#Get the weights in final fully connected layer
FC_weight <- as.array(Dense_model$arg.params$fc1_weight)
FC_weight <- array(FC_weight[,pred_pos], dim = c(1, 1, 1664, 1)) %>% mx.nd.array()

#Weighted sum of feature maps: the core of class activation mapping (CAM)
CAM.2 <- mx.nd.broadcast.mul(lhs = executor$ref.outputs$relu1_output, rhs = FC_weight)
CAM.2 <- mx.nd.sum(data = CAM.2, axis = 0:1) %>% as.array()

#Rescale to the range 0 to 1
CAM.2 <- (CAM.2 - min(CAM.2))/(max(CAM.2) - min(CAM.2)) 

#Check the difference
all.equal(CAM.1, CAM.2)
## [1] "Mean relative difference: 3.305733e-07"

Visualizing Network Logic (5)

par(mai = rep(0, 4))
img <- resized_img %>% grayscale()
plot(img, axes = FALSE)

#Define the color
  
cols <- colorRampPalette(c("#000099", "#00FEFF", "#45FE4F", 
                           "#FCFF00", "#FF9400", "#FF3100"))(256)

#Enlarge the class activation mapping (7*7 to 224*224)

CAM <- EBImage::resize(CAM.2, 224, 224)
CAM <- (round(CAM * 255) + 1) %>% as.integer()
FINAL_CAM <- cols[CAM] %>% paste0(., "80") %>% matrix(., 224, 224, byrow = TRUE) %>% .[224:1,] %>% as.raster()

#Visualization
plot(FINAL_CAM, add = TRUE)
  
#Show the prediction output
obj <- label_names[pred_pos]
legend('bottomright', paste0(substr(obj, 11, nchar(obj)), ' (prob = ', formatC(as.array(executor$ref.outputs$softmax_output)[pred_pos], 3, format = 'f'), ')'), bg = 'gray90')

Exercise 2: Making predictions with other models

– If you forgot where to download them, go to the Model zoo

#Load a pre-trained residual network model
res_model <- mx.model.load("model/resnet-18", 0)

#Display image
resized_img <- load.image('image/1.jpg') %>% EBImage::resize(., 224, 224)
X <- array(resized_img, dim = c(224, 224, 3, 1)) * 256

par(mai = rep(0, 4))
plot(resized_img, axes = FALSE)

#Predict
prob <- predict(res_model, X = X, ctx = mx.cpu())
cat(paste0(label_names[which.max(prob)], ': ', formatC(max(prob), 4, format = 'f'), '\n'))
## n01484850 大白鯊: 0.9435

Exercise 2 answer (plotting)

all_layers <- res_model$symbol$get.internals()
relu1_output <- which(all_layers$outputs == 'relu1_output') %>% all_layers$get.output()
softmax_output <- which(all_layers$outputs == 'softmax_output') %>% all_layers$get.output()

out <- mx.symbol.Group(c(relu1_output, softmax_output))
executor <- mx.simple.bind(symbol = out, data = c(224, 224, 3, 1), ctx = mx.cpu())
  
mx.exec.update.arg.arrays(executor, res_model$arg.params, match.name = TRUE)
mx.exec.update.aux.arrays(executor, res_model$aux.params, match.name = TRUE)
mx.exec.update.arg.arrays(executor, list(data = mx.nd.array(X)), match.name = TRUE)
mx.exec.forward(executor, is.train = FALSE)
#Get the prediction output
pred_pos <- which.max(as.array(executor$ref.outputs$softmax_output))

#Get the weights in final fully connected layer
FC_weight <- as.array(res_model$arg.params$fc1_weight)
FC_weight <- array(FC_weight[,pred_pos], dim = c(1, 1, 512, 1)) %>% mx.nd.array()

#Weighted sum of feature maps: the core of class activation mapping (CAM)
CAM.2 <- mx.nd.broadcast.mul(lhs = executor$ref.outputs$relu1_output, rhs = FC_weight)
CAM.2 <- mx.nd.sum(data = CAM.2, axis = 0:1) %>% as.array()

#Rescale to the range 0 to 1
CAM.2 <- (CAM.2 - min(CAM.2))/(max(CAM.2) - min(CAM.2)) 
par(mai = rep(0, 4))
img <- resized_img %>% grayscale() %>% EBImage::resize(., 224, 224)
plot(img, axes = FALSE)

#Define the color
  
cols <- colorRampPalette(c("#000099", "#00FEFF", "#45FE4F", 
                           "#FCFF00", "#FF9400", "#FF3100"))(256)

#Enlarge the class activation mapping (7*7 to 224*224)

CAM <- EBImage::resize(CAM.2, 224, 224)
CAM <- (round(CAM * 255) + 1) %>% as.integer()
FINAL_CAM <- cols[CAM] %>% paste0(., "80") %>% matrix(., 224, 224, byrow = TRUE) %>% .[224:1,] %>% as.raster()

#Visualization
plot(FINAL_CAM, add = TRUE)
  
#Show the prediction output
obj <- label_names[pred_pos]
legend('bottomright', paste0(substr(obj, 11, nchar(obj)), ' (prob = ', formatC(as.array(executor$ref.outputs$softmax_output)[pred_pos], 3, format = 'f'), ')'), bg = 'gray90')

Exercise 2 extension: Why does Inception Net only predict accurately after the mean is subtracted?

#Load a pre-trained Inception-BN network model
inception_model <- mx.model.load("model/Inception-BN", 126)

#Display image
resized_img <- load.image('image/1.jpg') %>% EBImage::resize(., 224, 224)
X <- array(resized_img, dim = c(224, 224, 3, 1)) * 256

par(mai = rep(0, 4))
plot(resized_img, axes = FALSE)

#Predict
prob <- predict(inception_model, X = X, ctx = mx.cpu())
cat(paste0(label_names[which.max(prob)], ': ', formatC(max(prob), 4, format = 'f'), '\n'))
## n04251144 潛水通氣管: 0.9066
#Load a pre-trained Inception-BN network model
inception_model <- mx.model.load("model/Inception-BN", 126)

#Resize image
resized_img <- load.image('image/1.jpg') %>% EBImage::resize(., 224, 224)
X <- array(resized_img, dim = c(224, 224, 3, 1)) * 256
X <- X - 128 #Important part for Inception-BN model

#Predict
prob <- predict(inception_model, X = X, ctx = mx.cpu())
cat(paste0(label_names[which.max(prob)], ': ', formatC(max(prob), 4, format = 'f'), '\n'))
## n01484850 大白鯊: 0.9080

Reference answer to the Exercise 2 extension (1)

– This is the result for InceptionNet:

inception_model <- mx.model.load("model/Inception-BN", 126)
all_layers <- inception_model$symbol$get.internals()
head(all_layers$outputs, 10)
##  [1] "data"             "conv_1_weight"    "conv_1_bias"     
##  [4] "conv_1_output"    "bn_1_gamma"       "bn_1_beta"       
##  [7] "bn_1_moving_mean" "bn_1_moving_var"  "bn_1_output"     
## [10] "relu_1_output"

– This is the result for ResNet:

res_model <- mx.model.load("model/resnet-18", 0)
all_layers <- res_model$symbol$get.internals()
head(all_layers$outputs, 10)
##  [1] "data"                "bn_data_gamma"       "bn_data_beta"       
##  [4] "bn_data_moving_mean" "bn_data_moving_var"  "bn_data_output"     
##  [7] "conv0_weight"        "conv0_output"        "bn0_gamma"          
## [10] "bn0_beta"

– This is the result for DenseNet:

Dense_model <- mx.model.load('model/densenet-imagenet-169-0', 125)
all_layers <- Dense_model$symbol$get.internals()
head(all_layers$outputs, 10)
##  [1] "data"                "bn_data_gamma"       "bn_data_beta"       
##  [4] "bn_data_moving_mean" "bn_data_moving_var"  "bn_data_output"     
##  [7] "conv0_weight"        "conv0_output"        "bn0_gamma"          
## [10] "bn0_beta"

Reference answer to the Exercise 2 extension (2)

– This is ResNet's preprocessing:

res_model$aux.params$bn_data_moving_mean
## [1] 123.3275 115.4906 102.1266
res_model$aux.params$bn_data_moving_var
## [1] 4926.562 4650.233 5043.490
res_model$arg.params$bn_data_gamma
## [1] 0.9999999 0.9999999 0.9999999
res_model$arg.params$bn_data_beta
## [1]  0.31789371  0.18047760 -0.06976888

– This is DenseNet's preprocessing:

Dense_model$aux.params$bn_data_moving_mean
## [1] 122.6735 114.9864 101.6775
Dense_model$aux.params$bn_data_moving_var
## [1] 4838.999 4547.774 4932.720
Dense_model$arg.params$bn_data_gamma
## [1] 2.536833e-12 2.536833e-12 2.536833e-12
Dense_model$arg.params$bn_data_beta
## [1] 0.62945306 0.39186531 0.07582176

Conclusion

– The design of Model Architecture is mainly moving toward sparse connections (group convolution), which is closer to the way our neurons connect, so this kind of structure may be more efficient

– The visualization part shows us that the computation of a convolutional neural network is not the black-box decision process everyone used to assume. This point is very important, because it gives us the opportunity to use this location information to design further "object segmentation" and "object recognition" models.